Term Importance Degree Impact on Search Result Clustering

Authors

  • Soheila Karbasi
  • Mehdi Yaghoubi
  • H. J. Zeng
  • Q. C. He
  • Z. Chen
  • W. Y. Ma
  • J. X. Yu
  • X. Lin
  • H. Lu
  • C. D. Manning
  • P. Raghavan
Abstract

Because actual clustering algorithms have to deal with the explosive growth of documents of various sizes and terms of various frequencies, an appropriate term-weighting scheme has a crucial impact on the overall performance of such systems. Term weighting is one of the critical processes for document retrieval and ranking in most search result clustering systems. In this paper we introduce a new technique for clustering algorithms that addresses the problem of indexing the terms of big datasets and their characteristics, a problem that exists in most current clustering approaches. The paper focuses on the term frequency normalization step of clustering algorithms. A new factor is applied to basic term-weighting schemes for use in the clustering process. The evaluation results confirm that this factor increases the performance of clustering techniques. The experiments were carried out on standard algorithms and the ODP-239 dataset, and the results were validated by statistical tests.
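As a rough illustration of the kind of scheme the abstract refers to, the sketch below computes a basic length-normalized TF-IDF weight and multiplies it by an optional per-term factor. The abstract does not define the paper's factor, so the `importance` argument and the function name `tf_idf_weights` are hypothetical placeholders, not the authors' method.

```python
import math
from collections import Counter


def tf_idf_weights(docs, importance=None):
    """Return one {term: weight} dict per document.

    docs: list of token lists.
    importance: optional {term: factor} mapping. This is a hypothetical
    stand-in for an extra term-importance factor; the paper's actual
    factor is not given in the abstract.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero for empty docs
        vec = {}
        for term, count in counts.items():
            tf = count / length                # length-normalized term frequency
            idf = math.log(n_docs / df[term])  # plain (unsmoothed) IDF
            factor = importance.get(term, 1.0) if importance else 1.0
            vec[term] = tf * idf * factor
        weights.append(vec)
    return weights


docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
baseline = tf_idf_weights(docs)
boosted = tf_idf_weights(docs, importance={"apple": 2.0})
```

With this toy input, a term that occurs in every document (here "banana") gets weight zero under the unsmoothed IDF, and the extra factor simply rescales the baseline weight of the terms it covers.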


Similar resources

Experiments in Document Clustering using Cluster Specific Term Weights

We study methods to initialize or bias different clustering methods using prior information about the “importance” of a keyword w.r.t. the specific clusters. These studies give us hints on how to initialize clustering methods in order to improve the clustering performance if prior knowledge is available. This can be especially useful if a user-specific clustering of a document collection or ...


Efficient Clustering Multiple Web Search Engine Results and Ranking

The World Wide Web is considered the most valuable place for Information Retrieval and Knowledge Discovery. Web search engines with effective and efficient techniques for Web service retrieval and selection have become an important issue. Existing web search results are based on keyword matching in a single search engine only. This paper details a modular, self-contained web search results clustering system t...


Experiments in Term Weighting and Keyword Extraction in Document Clustering

We study methods to initialize or bias different clustering methods using prior information about the “importance” of a keyword w.r.t. the whole document collection or a specific cluster. These studies give us hints on how to initialize clustering methods in order to improve performance if prior knowledge is available. This can be especially useful if a user-specific clustering of a document co...


Improving Retrieval Performance with Positive and Negative Equivalence Classes of Terms

One of the most pressing problems facing application developers in the area of information retrieval (IR) is the lack of sound mathematical, theoretical frameworks for understanding IR [SIGIR2000]. Although many such frameworks have been proposed, in the final analysis none has been sufficiently well-grounded to attain widespread acceptance in the field. In addition, there is all too often a la...


The impact of network characteristics on the diffusion of innovations

This paper studies the influence of network topology on the speed and reach of new product diffusion. While previous research has focused on comparing network types, this paper explores explicitly the relationship between topology and measurements of diffusion effectiveness. We study simultaneously the effect of three network metrics: the average degree, the relative degree of social hubs (i.e....




Publication date: 2014